Abstract

Autoimmune disease is a disorder of the immune system caused by the over-reaction of
lymphocytes against one's own body tissues. Anti-Nuclear Antibody (ANA) is an
autoantibody produced by the immune system and directed against the body's own tissues
or cells, and it plays an important role in the diagnosis of autoimmune diseases.
The Indirect ImmunoFluorescence (IIF) method with HEp-2 cells is the major
screening method for detecting ANA in the diagnosis of autoimmune diseases.
At present, fluorescence patterns are usually examined laboriously by experienced
physicians who manually inspect the slides under a microscope, a process that
suffers from inter-observer variability, which limits its reproducibility.
Previous research provided only simple segmentation methods and criteria for
cell segmentation and recognition; a fully automatic framework for the
segmentation and recognition of HEp-2 cells has not been reported. This
study proposes a method based on the watershed algorithm to automatically detect
HEp-2 cells with different patterns. The experimental results show that the
segmentation performance of the proposed method is satisfactory when evaluated
with percent volume overlap (PVO: 89%). Classification with an SVM classifier
designed on features calculated from the segmented cells achieves an average
accuracy of 96.90%, which outperforms the methods presented in previous studies.
The proposed method can be used to develop a computer-aided system to assist
physicians in the diagnosis of autoimmune diseases.



Introduction 

The immune system enables us to resist infections by counteracting invading organisms.
Autoimmune disease is a disorder of the immune system caused by the over-reaction of
lymphocytes against one's own body tissues [1]. Common autoimmune diseases include
Hashimoto's thyroiditis, rheumatoid arthritis, diabetes mellitus type 1, and lupus
erythematosus. Anti-Nuclear Antibody (ANA) is an autoantibody produced by the
immune system and directed against the body's own tissues or cells. The ANA test,
widely used to detect these antibodies in the blood, plays an important role in the
diagnosis of autoimmune diseases. When a particular antibody pattern is detected, the
patient may be at risk of acquiring certain autoimmune diseases.

The Indirect ImmunoFluorescence (IIF) technique applied to HEp-2 cell substrates
is the major screening method for detecting ANA patterns in the diagnosis of
autoimmune diseases. IIF slides yield ANA images with distinct fluorescence intensities
and staining patterns. Currently, the ANA patterns are inspected by experienced
physicians to identify abnormal cell patterns, which is a laborious task and may harm
the physicians' eyes. Training a qualified physician also takes a long time.
Furthermore, manual inspection suffers from intra- and inter-observer variability,
which limits the reproducibility of IIF readings [2-5].

Although previous studies have proposed several methods for automatic segmentation
of ANA cells [6,7] and criteria for recognition of cell patterns [3,6,8-10], a fully
automatic segmentation and recognition framework has not yet been developed.
In this study, we propose a framework based on watershed approaches to automatically
segment the HEp-2 cells. Segmentation is a crucial preprocessing step for a
computer-aided system that classifies the cell patterns and provides information to
assist physicians in disease diagnosis and treatment.

Since the cytoplasm of HEp-2 cells is invisible in the IIF images, in what follows, the 
term "cell" means cell nucleus, "foreground" indicates the cell region, and "background" 
denotes the rest of the image. The rest of this paper is organized as follows. Section 
"Related Works" reviews the techniques used for ANA image segmentation and cell 
recognition in previous studies. Section "Segmentation of ANA Cells" describes the 
methods proposed in this study for the segmentation of ANA cells. Classification of 
ANA cell patterns is demonstrated in section "Cell Classification of ANA Images". 
Finally, the discussion, conclusions, and future work are presented in sections
"Discussion" and "Conclusion and Future Work".

Related works 

In this section, the methods proposed in previous investigations for the segmentation 
and classification of ANA cell images are presented. 

ANA image segmentation 

Perner et al. [6] used image processing techniques, including image transformation,
histogram equalization, Otsu thresholding [11], and morphological operations, to obtain
a binary mask for segmenting the cells from the ANA images. By modifying these methods,
Huang et al. [7] presented two adaptive automatic segmentation frameworks to precisely
extract the ANA cells. The first framework classified an image into one of two
categories, i.e., sparse and mass cell regions, based on the number of connected regions.
Depending on the category of the image, different color spaces and processing
techniques were adopted for cell segmentation. Morphological operations were also used
to obtain smooth segmentation results. The framework was demonstrated to be able to
segment IIF images of different patterns. In the second framework, watershed
segmentation [12] was applied to the green channel of the RGB images, followed by
region merging and elimination to obtain the cell boundaries. If the number of regions
in the obtained image was larger than a pre-defined threshold, the framework converted
the original image into the CMY color space and performed marker-controlled watershed
segmentation [13] on the cyan color component. The segmentation performance was
reported to achieve an overall sensitivity of 94.7%.

Creemers et al. [14] proposed an unsupervised segmentation algorithm, based on
iterative global Otsu thresholding and morphological opening, to support IIF
testing. It was reported to be capable of splitting connected regions into individual
regions with an average accuracy of 89.57%.

ANA cell recognition 

Perner [8] presented the first study on fluorescent image analysis, feature extraction,
and classification. Then, an automatic cell recognition approach based on a variety of
features, including size, color density, and number of cells, extracted from the
segmented images was proposed [6]. For the cells with identical color density, further
features were extracted, including standard deviation, mean shape factor, mean of
perimeter, and standard deviation of perimeter. Data mining techniques, including the
Boolean model and decision tree induction, were then used to label the cell regions.
Finally, human experts tagged each labeled region with a semantic label. Based on the
aforementioned methods, Sack et al. [3] presented a system to automatically classify
HEp-2 fluorescent patterns with a classification accuracy greater than 83%.

According to the fluorescence intensity, Soda and Iannello [9] classified the ANA
images into a variety of patterns. They further proposed a framework consisting of
hybrid rule-based multi-expert systems for the classification of ANA patterns with an
overall error rate of 2.7-5.8% [15]. The framework extracted features including the
first, second, and fourth moments of the gray-level co-occurrence matrix, Zernike
moments, and the coefficients of the discrete cosine transform (DCT) and discrete
wavelet transform (DWT). Building on previous research, Rigon et al. [16] proposed a
comprehensive system based on two approaches: the first discriminated the positive
cells from the negative and weakly positive cells based on features of fluorescence
intensity, whereas the second recognized the staining pattern of the positive cells.
The performance of positive/negative recognition ranged from 87% to more than 94%,
whereas the staining pattern classification accuracy of the main classes, i.e.,
homogeneous cells, peripheral nuclear cells, speckled cells, nucleolar cells, and
artefacts, ranged from 71% to 74%.

Elbischger et al. [17] developed an iterative thresholding algorithm for processing
HEp-2 cells and a cell classifier for detecting autoimmune diseases. Features including
area-to-perimeter ratio, variance, 30th and 60th normalized percentiles, percentile
range, dent number, auto-covariance percentage, and roundness were extracted from
the segmented cells and used for cell classification. The system was reported to be
capable of distinguishing 5 different patterns with an overall accuracy of 93% on a
dataset consisting of 982 ROIs extracted from 38 images.

Recently, Huang et al. [18] employed the self-organizing map (SOM) to identify the
fluorescence patterns of HEp-2 cells. Fourteen features, including the perimeter, area,
and histogram uniformity of the cell; area and average intensity of the inside and
perimeter areas of the cell; higher and lower intensity ratios of the inside area,
perimeter area, and whole area of the cell; and standard deviation of the inside area
of the cell, were used for designing the classifier, which achieved an average accuracy
of 92.4%. In [19], EUROPattern, a system based on the k-nearest neighbor algorithm, was
compared with conventional visual IIF evaluation and achieved a sensitivity and
specificity of 100% and 97.5%, respectively. In addition, 94.0% of all the main antibody
patterns, including the positive patterns, i.e., homogeneous, speckled, nucleolar,
centromere, nuclear dotted, and cytoplasmic patterns, as well as the negative patterns,
were correctly recognized.

Segmentation of ANA cells 

As recommended by the Center for Disease Control (CDC) [20,21], the IIF slides in this
study were prepared at 1:80 serum dilution, and the ANA images were acquired by a
digital camera mounted on a fluorescence microscope at 40-fold magnification. The images
were stored in 24-bit RGB format at a resolution of 3136x2352 pixels. As shown in
Figure 1, the ANA cells are classified into six categories: diffused, peripheral,
nucleolar, coarse-speckled, fine-speckled, and discrete-speckled patterns. A dataset of
196 images, classified by an expert (Dr. Hsieh) into 37 diffused, 29 peripheral,
5 nucleolar, 94 coarse-speckled, 1 fine-speckled, and 30 discrete-speckled images, was
used for the experiments. The procedure of the proposed method is illustrated in
Figure 2.

Since the original images are stained with green dye, the proposed method extracts
only the green channel from the original RGB ANA images for processing. To reduce
the computation time, the images are downsized from 3136x2352 to 1024x768 pixels.
Images at this resolution were found to still provide enough information for the
segmentation and classification of cell patterns. Figure 3 shows an example of an ANA
image and its corresponding green channel. The proposed segmentation method is divided
into 3 procedures, i.e., pre-classification, cell detection 1, and cell detection 2,
which are presented in the following 3 subsections. The parameters used for these
3 procedures are described in subsection "Parameters for cell segmentation". Finally,
the segmentation results are demonstrated in subsection "Segmentation results".

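The preprocessing described above can be sketched in a few lines of plain Python. This
is a minimal illustration, not the authors' implementation: the function names are
hypothetical, the image is assumed to be a nested list of (R, G, B) tuples, and
nearest-neighbour sampling is an assumption, since the interpolation method used for
downsizing is not stated.

```python
def green_channel(rgb_image):
    """Keep only the green component of every (R, G, B) pixel."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def downsize_nearest(gray, new_width, new_height):
    """Downsize a gray-level image by nearest-neighbour sampling (assumed method)."""
    old_height, old_width = len(gray), len(gray[0])
    return [
        [gray[(y * old_height) // new_height][(x * old_width) // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]
```

For the images in this study, the call would be
downsize_nearest(green_channel(image), 1024, 768).
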








Figure 1 Example of ANA images. Images classified into (a) diffused, (b) peripheral, (c) nucleolar, (d)
coarse-speckled, (e) fine-speckled, and (f) discrete-speckled patterns.

Figure 2 Procedure of the proposed method. Flowchart boxes: original images (RGB,
3136x2352 resolution); dataset image (green channel, downsized to 1024x768);
pre-classification; Cell Detection 1; Cell Detection 2; remove noises; extract
foreground by applying Otsu thresholding twice; extract foreground by applying Otsu
thresholding once.




Pre-classification 

Automatic segmentation of ANA images cannot be handled in a unified way because the
characteristics of the images in different categories are quite dissimilar. For example,
the discrete-speckled cells look like irregular broken blobs and differ significantly
from the cells of the other 5 categories, which appear as elliptic blobs although their
appearances still vary (cf. Figure 1). Thus, the images are pre-classified according to
their differences in image patterns before cell segmentation is conducted. In the
pre-classification stage, the images are divided into two groups. Images with larger
gray-level variance or more regions in the foreground are assigned to the first group,
and the rest of the images are assigned to the second group. The images in these two
groups are segmented using different methods, as detailed in subsections
"Cell detection 1" and "Cell detection 2". The procedure of pre-classification is
summarized as follows:

1) First, the Otsu thresholding algorithm is used to roughly separate the foreground
regions from the background.

2) The morphological closing operation is employed to fill the holes and to eliminate
small regions in the foreground.

3) If the number of foreground regions in an image is larger than the threshold
th_num, or its foreground regions contain staining noise with variance higher than
the threshold th_fg_var, the image is segmented using "Cell detection 1"; otherwise
"Cell detection 2" is adopted. In this study, the thresholds th_num and th_fg_var are
set to 200 and 1000, respectively.

4) For images segmented with "Cell detection 2", the staining noise in the background
regions is removed according to its noise level, as defined in the following
equation:

$$\sum_{i=0}^{128} [\, p(i) > 0 \,],$$

where i indicates the gray level of the image, p(i) denotes the frequency of gray
level i in the image, and the bracket equals 1 when its condition holds and 0
otherwise. The threshold of noise level, th_noise, is set to 10.
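The pre-classification rule in steps 1-4 can be sketched as follows. This is a
minimal illustration under stated assumptions rather than the authors' code: the
function names are hypothetical, the image is a nested list of integer gray levels, the
noise level is assumed to count the gray levels from 0 to 128 that occur at least once,
and the number of foreground regions and the foreground variance are assumed to be
computed elsewhere.

```python
def histogram(gray, levels=256):
    """Frequency p(i) of every gray level i."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    return hist


def otsu_threshold(gray, levels=256):
    """Return the gray level t that maximizes the between-class variance;
    pixels above t are treated as foreground."""
    hist = histogram(gray, levels)
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    weight0 = sum0 = 0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between = weight0 * weight1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def noise_level(gray, max_level=128):
    """Count the gray levels in [0, max_level] that occur at least once (assumed reading)."""
    hist = histogram(gray)
    return sum(1 for i in range(max_level + 1) if hist[i] > 0)


def choose_detector(num_regions, fg_variance, th_num=200, th_fg_var=1000):
    """Step 3: pick the segmentation procedure from the two pre-classification tests."""
    if num_regions > th_num or fg_variance > th_fg_var:
        return "Cell detection 1"
    return "Cell detection 2"
```

The default thresholds are the values of th_num and th_fg_var given in the text.
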

Cell detection 1 

This approach is designed for cell detection in images that contain more foreground
regions or whose foreground gray levels show great variance. It consists of two stages:
image segmentation and cell extraction. Cells are extracted according to the cell
contours obtained from general watershed segmentation [12] and marker-controlled
watershed segmentation [13]. As described below, marker extraction is divided into
four steps. Figure 4 illustrates the results obtained from the individual steps.

1) Histogram equalization is applied to the original image in Figure 4(a), and the
pixels with gray level greater than 240 are then taken as the initial markers,
as presented in Figure 4(b).

2) As demonstrated in Figure 4(c), the original image is smoothed by the morphological
opening operation using a disk-shaped structuring element with a radius of 15.

3) The difference between the original image and the smoothed image is obtained,
Figure 4(d), and is converted into a binary image, Figure 4(e), by applying the Otsu
thresholding method.

4) The initial markers are superimposed on the thresholded image shown in Figure 4(e),
followed by the same morphological opening operation as in the 2nd step, to obtain the
marker image, Figure 4(f), used for marker-controlled watershed segmentation. The
flowchart of marker extraction is depicted in Figure 5.
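Step 1 of the marker extraction can be sketched as follows. This is only an
illustration under assumptions: the function names are hypothetical, the image is a
nested list of 8-bit gray levels, and the standard cumulative-histogram form of
histogram equalization is assumed because the exact variant is not stated. Steps 2-4
additionally need morphological opening and are not covered here.

```python
def equalize_histogram(gray, levels=256):
    """Standard histogram equalization through the cumulative histogram (assumed variant)."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # flat image: nothing to stretch
        return [list(row) for row in gray]
    lookup = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lookup[value] for value in row] for row in gray]


def initial_markers(gray, level=240):
    """Binary image of the pixels brighter than `level` after equalization."""
    equalized = equalize_histogram(gray)
    return [[1 if value > level else 0 for value in row] for row in equalized]
```
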

As described in Steps 5-7, 3 types of watershed images are obtained based on the
original image, smoothed image, and marker image, and are used for cell segmentation.

5) The original image shown in Figure 4(a) is complemented by subtracting each
pixel value from 255 before watershed segmentation is conducted. Figure 6(a) shows the
resulting background watershed image (b-ws) superimposed on the original image.

6) Furthermore, Gaussian differentiation with σ = 2 and h-minima suppression with
threshold th_h1 = 0.12 [22] are applied to the smoothed image before watershed
segmentation is conducted to obtain the foreground watershed segmentation image (f-ws).
Figure 6(b) presents the "f-ws" image superimposed on the original image.

7) Similar to the foreground watershed segmentation, the smoothed image is first
filtered by Gaussian differentiation and its minima are suppressed by the h-minima
transform. The result is then superimposed with the marker image and the "b-ws" image
to obtain the foreground marker-controlled watershed (fmc-ws) image used for
marker-controlled watershed segmentation. Figure 6(c) shows the "fmc-ws" image.

Figure 4 Extraction of marker images. (a) The original, (b) initial marker, and (c)
smooth images. (d) Difference image obtained from the original image and the smooth
image. (e) Image after performing Otsu thresholding on the difference image and (f) the
marker image after applying the morphological opening operation.

Figure 5 Flowchart of marker extraction. Flowchart boxes: original image, Figure 4(a);
histogram equalization; intensity > 240; initial marker, Figure 4(b); morphological
opening (disk with radius 15); smooth image, Figure 4(c); difference image, Figure 4(d);
Otsu thresholding, Figure 4(e); morphological opening; marker image, Figure 4(f);
Gaussian differentiation; h-minima transform; watershed; "b-ws" image, Figure 6(a);
"f-ws" image, Figure 6(b); "fmc-ws" image, Figure 6(c).

Figure 6 Examples of three types of watershed images. (a) Background-watershed (b-ws)
image, (b) foreground-watershed (f-ws) image, and (c) foreground
marker-controlled-watershed (fmc-ws) image superimposed on the original image.



These 3 types of watershed images are further used for cell segmentation.
As demonstrated in Figure 6, the "b-ws" image is effective in splitting cells that are
close to each other. The blobs in "fmc-ws" are mostly over-segmented with unsmooth
contours, so the cell contours are not effectively delineated. On the other hand, the
"f-ws" image is unable to detect some of the cell regions. Consequently, in the cell
extraction stage of "Cell detection 1", the three types of watershed images and the
marker image are combined to precisely extract cell boundaries.

As illustrated in Figure 7, the strategies for cell extraction using the watershed
images are described in the following steps:

1) The three watershed images, i.e., "b-ws", "f-ws", and "fmc-ws", are all binary
images. The cell contours in the "f-ws" and "fmc-ws" images are labeled as ZERO and
all other pixels as ONE, followed by the removal of background regions to obtain the
watershed mask images shown in Figure 8(a) and 8(b), respectively.

2) The cell regions are extracted from the "fmc-ws" image according to the perimeters
of the connected regions, since it can potentially detect more cell regions than
"f-ws". The regions whose areas are larger than the threshold th_area are verified by
the "ellipse test" and considered as cells if they pass the test.

3) For the regions with areas smaller than the threshold th_area, the morphological
closing operation is conducted to merge smaller regions. The merged regions are then
verified by the ellipse test for cell extraction. As demonstrated in Figure 9, the small
inner regions of the remains are merged into larger regions.

4) The regions remaining from "fmc-ws" at Step 3 and from "f-ws" that are not deemed
ellipses, but that have areas larger than th_area and contain markers in the
corresponding locations, are treated as candidate cells. Because the blobs of "f-ws"
are more similar to real cells than those of "fmc-ws", "f-ws" is used to perform cell
extraction before "fmc-ws" is applied here.

5) Most of the cells in "f-ws" and "fmc-ws" should have been extracted in the previous
4 steps, but some regions may remain undetected because their markers are large
enough to cover the edges of the regions. Figure 10(a) demonstrates the cells detected
at Steps 1-4. However, as shown in Figure 10(b), watershed segmentation may fail to
detect a cell whose corresponding marker is large enough to cover the whole candidate
cell. Hence, if a marker in the marker image is larger than the threshold th_area2,
watershed segmentation (with the h-minima transform parameter th_h2) is performed in
the corresponding region of the smooth image. Here, only the corresponding region of
"b-ws", as shown in Figure 10(c), is considered for extracting the cells.

Figure 7 Procedure of cell extraction in "Cell detection 1". Flowchart of Steps 2-5:
intermediate images obtained from watershed ("fmc-ws", "f-ws", and "b-ws") and the
resulting cell image.

Because real HEp-2 cells usually appear as ellipses, candidate cells can be verified
by the "ellipse test", which decides whether a region contains a cell or not.
Given a region $r_i$, the error between $r_i$ and its ideal ellipse $\hat{r}_i$ is
defined as:

$$e_i = \frac{|r_i'|}{|r_i|}, \quad \text{with } r_i' = r_i \,\mathrm{XOR}\, \hat{r}_i \text{ and } \hat{r}_i = \mathrm{Ellipse}(a, b, \theta),$$

in which $|r_i|$ denotes the number of pixels in $r_i$ and $\hat{r}_i$ is the estimated
ideal ellipse for $r_i$, comprising three parameters: major-axis length (a), minor-axis
length (b), and orientation (θ). The lengths of the major and minor axes are both
computed with respect to the centroid of $r_i$. If the error of a region is equal to
zero, the region is deemed an ideal ellipse. Figure 11 depicts a region and its
estimated ideal ellipse. If the error of a region is lower than the threshold th_error,
it is marked as a cell; otherwise, it is treated as one of the following cases: not a
cell, an incomplete cell, or a connected region.

Figure 8 Mask images of cell regions with background removal. (a) "f-ws" image and (b)
"fmc-ws" image.

Figure 9 Merging the small inner regions of a region not verified as a cell. (a) Inner
regions which are not verified as cells and (b) larger inner regions obtained by
merging smaller inner regions with the morphological closing operation.
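The ellipse test can be sketched in plain Python as follows. This is a minimal
illustration under assumptions, not the authors' implementation: the function names are
hypothetical, a region is represented as a set of (row, column) pixel coordinates, and
the ideal ellipse is fitted from the second-order moments about the centroid, which is
one common choice; the text only states that the axes are computed with respect to the
centroid.

```python
import math


def fit_ellipse(region):
    """Return (cy, cx, a, b, theta) of the ellipse with the same second moments."""
    n = len(region)
    cy = sum(r for r, _ in region) / n
    cx = sum(c for _, c in region) / n
    myy = sum((r - cy) ** 2 for r, _ in region) / n
    mxx = sum((c - cx) ** 2 for _, c in region) / n
    mxy = sum((r - cy) * (c - cx) for r, c in region) / n
    spread = math.hypot((mxx - myy) / 2, mxy)
    # A solid ellipse with semi-axis a has variance a^2 / 4 along that axis.
    a = 2 * math.sqrt(max((mxx + myy) / 2 + spread, 1e-12))
    b = 2 * math.sqrt(max((mxx + myy) / 2 - spread, 1e-12))
    theta = 0.5 * math.atan2(2 * mxy, mxx - myy)
    return cy, cx, a, b, theta


def ellipse_pixels(cy, cx, a, b, theta):
    """Rasterize the ideal ellipse into a set of (row, column) pixels."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    reach = int(math.ceil(a)) + 1
    pixels = set()
    for r in range(int(cy) - reach, int(cy) + reach + 1):
        for c in range(int(cx) - reach, int(cx) + reach + 1):
            u = (c - cx) * cos_t + (r - cy) * sin_t    # along the major axis
            v = -(c - cx) * sin_t + (r - cy) * cos_t   # along the minor axis
            if (u / a) ** 2 + (v / b) ** 2 <= 1.0:
                pixels.add((r, c))
    return pixels


def ellipse_error(region):
    """Pixels in (region XOR ideal ellipse) divided by pixels in the region."""
    ideal = ellipse_pixels(*fit_ellipse(region))
    return len(region ^ ideal) / len(region)


def is_cell(region, th_area=400, th_error=0.095):
    """A region passes when it is large enough and close enough to an ellipse."""
    return len(region) > th_area and ellipse_error(region) < th_error
```

A filled disc passes this test with an error close to zero, whereas two neighbouring
discs taken as one region, standing in for a pair of merged cells, fail it because the
fitted ellipse covers far more area than the region itself.
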

Cell detection 2 

This approach is applied to the images with fewer foreground regions (fewer than
th_num = 200) and less staining noise (variance below th_fg_var = 1000) detected in the
foreground. As shown in Figure 12, the procedure is very similar to that in
"Cell detection 1", except that the image segmentation uses only the "b-ws" (red) and
"f-ws" (green) watershed images without considering the "fmc-ws" image. The procedure
is described as follows:

1) Remove the background regions of the "f-ws" image.

2) Extract cell regions from "f-ws".

3) Because of the characteristics of watershed segmentation, adjacent regions form
connected regions. The regions that are not extracted in step 2 may be fake connected
regions, which can be split by using the information embedded in the "b-ws" image. As
illustrated in Figure 13(a), a sub-region that connects two watershed regions and is
crossed by a line in the "b-ws" image is eliminated, resulting in the separation of two
cell blobs, Figure 13(b). Subsequently, watershed segmentation (with the designated
h-minima transform parameter th_h1) is further performed on the individual cell regions
appearing on "f-ws", Figure 13(c). The sub-regions in the refined cell regions are then
merged and verified by the "ellipse test".

4) For connected cell regions that cannot be split at step 3, all possible combinations
of sub-regions are tested to find the combinations that are similar to ellipses. Once
the best combination has been obtained, the cell regions can be well separated from the
background. Figure 14 illustrates the procedure of splitting a region containing three
candidate cell regions. A connected region $r_i$ consisting of $N_i$ sub-regions can be
written as:

$$r_i = \{ r_{i,1}, r_{i,2}, \ldots, r_{i,N_i} \}.$$

The error function of the k-th combination of sub-regions, $comb_k$, is calculated
according to:

$$e_k = \frac{|comb_k'|}{|comb_k|},$$

where $comb_k' = comb_k \,\mathrm{XOR}\, \widehat{comb}_k$, with $\widehat{comb}_k$
denoting the estimated ideal ellipse of $comb_k$. Once the combination with the
smallest error, $k^* = \arg\min_k e_k$, has been found, the connected region can be
split into isolated regions accordingly. As shown in Figure 14(c), the "b-ws" image is
superimposed on the "f-ws" image to form 17 sub-regions. The combinations with the
smallest errors include $\{r_{i,1}, r_{i,2}, r_{i,5}, r_{i,9}\}$,
$\{r_{i,4}, r_{i,10}, r_{i,11}\}$, $\{r_{i,14}, r_{i,15}, r_{i,16}, r_{i,17}\}$, and
$\{r_{i,3}, r_{i,6}, r_{i,7}, r_{i,8}, r_{i,12}\}$. After the ellipse tests, the
sub-region combinations {1, 2, 5, 9}, {4, 10, 11}, and {14, 15, 16, 17} are merged into
3 cell regions, while the combination {3, 6, 7, 8, 12} is discarded. Once a connected
region has been split, the new regions are refined by performing watershed segmentation
(with the designated h-minima transform parameter th_h2) at the corresponding locations
in the "b-ws" image. The sub-regions {3, 6, 7, 8, 12} are discarded because their
intensities and textures are very similar to the background once the local watershed
segmentation has been applied.

5) Because the foreground of the dataset images may contain inhomogeneous gray levels,
some regions cannot be detected because they are darker than other regions, even though
they can be discriminated by the human eye. To detect these regions, global Otsu
thresholding is performed again on the remaining image after cell extraction. Detected
regions with areas greater than th_area are considered as cells.

Figure 10 Extracting cells for regions with large markers. Illustration of (a) 8
detected cells (green) and 1 undetected cell, (b) the marker corresponding to the b-ws
region containing the undetected cell, and (c) superimposition of images shown in (a)
and (b).

Figure 11 A region $r_i$ and its fitted ellipse $\hat{r}_i$ (green).

Figure 12 An example of cell segmentation using "Cell Detection 2".

Figure 13 Splitting of a fake sub-region. (a) A fake sub-region crossed by a line in
"b-ws" is split into (b) two separated candidate cell regions. (c) Subsequent watershed
segmentation is performed on the candidate cell regions on "f-ws".

Figure 14 Example of splitting a connected region. (a) A connected region is split into
3 cell regions by (b) superimposing "b-ws" image on "f-ws" image and then (c)
determining combined sub-regions and discarded sub-regions.
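The combination search in step 4 can be sketched as an exhaustive ranking of sub-region
subsets by their ellipse error. This is a hedged illustration, not the authors' code:
rank_combinations and ellipse_error are hypothetical names, sub-regions are sets of
(row, column) pixels, and the ellipse is fitted from second-order moments, which is an
assumption. The number of subsets doubles with every added sub-region, so a practical
implementation would presumably restrict the search, for example to adjacent
sub-regions.

```python
import math
from itertools import combinations


def ellipse_error(region):
    """|region XOR moment-matched ellipse| / |region| (assumed fitting method)."""
    n = len(region)
    cy = sum(r for r, _ in region) / n
    cx = sum(c for _, c in region) / n
    myy = sum((r - cy) ** 2 for r, _ in region) / n
    mxx = sum((c - cx) ** 2 for _, c in region) / n
    mxy = sum((r - cy) * (c - cx) for r, c in region) / n
    spread = math.hypot((mxx - myy) / 2, mxy)
    a = 2 * math.sqrt(max((mxx + myy) / 2 + spread, 1e-12))
    b = 2 * math.sqrt(max((mxx + myy) / 2 - spread, 1e-12))
    theta = 0.5 * math.atan2(2 * mxy, mxx - myy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    reach = int(math.ceil(a)) + 1
    ideal = set()
    for r in range(int(cy) - reach, int(cy) + reach + 1):
        for c in range(int(cx) - reach, int(cx) + reach + 1):
            u = (c - cx) * cos_t + (r - cy) * sin_t
            v = -(c - cx) * sin_t + (r - cy) * cos_t
            if (u / a) ** 2 + (v / b) ** 2 <= 1.0:
                ideal.add((r, c))
    return len(region ^ ideal) / n


def rank_combinations(sub_regions):
    """sub_regions maps an id to a pixel set; returns (error, ids) pairs, best first."""
    ids = sorted(sub_regions)
    ranked = []
    for size in range(1, len(ids) + 1):
        for combo in combinations(ids, size):
            merged = set().union(*(sub_regions[i] for i in combo))
            ranked.append((ellipse_error(merged), combo))
    ranked.sort()
    return ranked
```

In a toy case where a disc is cut into two halves and a thin bar lies next to it, the
two halves together rank first, and the bar would be left over in the same way that
sub-regions {3, 6, 7, 8, 12} are discarded in Figure 14.
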

Parameters for cell segmentation 

The parameters used for the different stages of cell segmentation are listed in
Table 1. The parameters th_h1 and th_h2 are crucial in effectively suppressing noise and
local irregularities in the gradient images. Furthermore, the segmentation results are
very sensitive to these parameters, even when they are changed only slightly; hence they
are set case by case to obtain complete blobs and avoid over-segmentation. If the values
are too small, the blob will be over-segmented and more time is needed to find the
combination of sub-regions with the smallest error. In contrast, larger values may
cause the watershed to reach a boundary outside the blob and fail to converge at the
real boundaries. The parameters th_h1 and th_h2 are set using a greedy algorithm.

Table 1 Parameters designated for different stages of cell segmentation.

Pre-classification       Cell Detection 1         Cell Detection 2
Parameter    Value       Parameter    Value       Parameter    Value
th_num       200         th_h1        0.12        th_h1        0.12
th_noise     10          th_h2        0.28        th_h2        0.28
th_fg_var    1000        th_area      400         th_area      400
                         th_error     0.095       th_error     0.095
                         th_area2     32

The parameters th_area and th_error are used as the criteria for judging whether a
blob is a cell or not. For an ANA image with a size of 1024x768 pixels, the
minimum cell size is set to 400 pixels, i.e., th_area = 400, according to the
physician's opinion. Figure 15 compares the errors among regions with different shapes.
Note that a perfect ellipse has zero error. Since HEp-2 cells may be squeezed,
superimposed, demolished, or otherwise deviate from a perfect ellipse, the value of
th_error (0.095) is determined by a greedy algorithm with a grid size of 0.005, which
selects the threshold with the best detection accuracy on the 3830 ground-truth cell
images extracted from the 196 images in the dataset. On the other hand, the parameter
th_area2 is used to find the markers located in non-recognized cell regions. In the
cases of nucleolar and discrete-speckled patterns, the markers could be too small to be
used for cell detection. Hence, its value is set to th_area2 = 32 as a mild restriction.
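The selection of th_error on a grid of step 0.005 can be sketched as a simple sweep.
This is an assumption-laden illustration rather than the procedure actually used: the
function name, the upper bound of the grid, and the accuracy criterion (a region is
predicted to be a cell when its ellipse error is below the threshold) are all
hypothetical, and the sweep is exhaustive, whereas the text describes the search as
greedy.

```python
def select_error_threshold(errors, is_cell_labels, step=0.005, max_threshold=0.5):
    """Return (threshold, accuracy) of the first grid value with the best accuracy."""
    best_threshold, best_accuracy = 0.0, -1.0
    n_steps = int(round(max_threshold / step))
    for k in range(n_steps + 1):
        threshold = round(k * step, 6)  # avoid accumulating floating-point drift
        correct = sum(
            (error < threshold) == label
            for error, label in zip(errors, is_cell_labels)
        )
        accuracy = correct / len(errors)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy
```

Here errors would hold the ellipse errors of labeled regions and is_cell_labels the
corresponding ground-truth decisions.
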

Segmentation results 

Figure 16 demonstrates the segmentation results of ANA images with 6 different
patterns. As shown in this figure, the proposed method performs well on almost all the
images with different cell patterns; however, the performance on images of diffused and
discrete-speckled patterns is less satisfactory because the cells of the diffused
pattern contain more closely connected regions than the other types of cells, whereas
the cells of the discrete-speckled pattern have less obvious boundaries. Figure 17
compares the segmentation results of the proposed method with examples of the
ground-truth images. The ground-truth images were delineated by technicians trained by
one of the authors, Dr. Hsieh. Performance of the segmentation results was evaluated
with percent volume overlap (PVO) and percent volume difference (PVD), which have been
widely used in previous works [23-26].

Figure 16 Examples of segmentation results of ANA images. Segmented cells of images
with (a) coarse-speckled, (b) diffused, (c) discrete-speckled, (d) fine-speckled, (e)
nucleolar, and (f) peripheral patterns, respectively.

Given two contours, denoted by Q and Cg, obtained from the proposed method and 
the ground truth, respectively, of the segmented image, PVO and PVD can be calcu- 
lated from the following formula: 



pvo(Q,q) = 

PVD{Q,Cg) = 
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(a) (b) 



° C ^^l^ry O CO OO _ ' 



: 0^ ,0° 



o 



(c) (d) 



(e) (f) 

Figure 17 Comparisons of segmentation results between proposed method and ground-truth. 

Segmented results of images with (a) coarse-sped<led, (b) diffused, (c) discrete-speckled, (d) fine-speckled, 
(e) nucleolar, and (f) peripheral patterns overlapped on ground-truth images. 



where V(C) indicates the volume of a contour C. Table 2 presents the average
performance of the segmentation results evaluated based on PVO and PVD. The results
show that the proposed method can detect cells accurately in most image cases with
the PVO greater than 89% and the PVD less than 22%. Even for the most difficult



Table 2 Comparisons of cell segmentation performance

Pattern             Cell number   PVO (%)   PVD (%)
Coarse speckled     260           92.35     15.31
Diffuse             157           87.89     24.23
Discrete speckled   195           78.03     43.95
Fine speckled       67            91.27     17.46
Nucleolar           175           91.94     16.12
Peripheral          153           93.47     13.06

Performance evaluated based on average percent volume overlap (PVO) and percent volume difference (PVD).
cases, which appear in cells with the discrete-speckled pattern, the PVO still exceeds
75%. In fact, it is not necessary to segment HEp-2 cells with great accuracy.
However, the segmentation results must be good enough to extract features that
support accurate cell classification, as described in the next section.
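
The overlap measures can be illustrated with a short sketch. The formulas themselves are not
reproduced in the text above, so the definitions below are assumptions rather than the
authors' equations: a Dice-style overlap and a symmetric-difference ratio, chosen only
because this pair satisfies difference = 2 × (100 − overlap), a relation that the tabulated
PVO and PVD values also follow to within rounding. The function names are hypothetical, and
the "volume" of a contour is taken to be the pixel count of the region it encloses.

```python
def mask_to_pixels(mask):
    """Return the set of (row, col) positions where a binary mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def overlap_metrics(segmented, ground_truth):
    """Return (overlap %, difference %) for two binary masks of equal shape.

    Assumed definitions (not taken from the paper):
      overlap    = 2 * |A & B| / (|A| + |B|) * 100
      difference = 2 * |A ^ B| / (|A| + |B|) * 100
    """
    a = mask_to_pixels(segmented)
    b = mask_to_pixels(ground_truth)
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("both masks are empty")
    overlap = 200.0 * len(a & b) / total
    difference = 200.0 * len(a ^ b) / total
    return overlap, difference


# Toy example: two 4-pixel regions that share 3 pixels.
seg = [[1, 1, 0],
       [1, 1, 0]]
truth = [[1, 1, 0],
         [1, 0, 1]]
pvo, pvd = overlap_metrics(seg, truth)
print(f"overlap = {pvo:.2f}%, difference = {pvd:.2f}%")
```

With these assumed definitions a perfect match gives 100% overlap and 0% difference, and
the two numbers carry the same information.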

Cell classification of ANA images 

Because of astigmatism, the texture details of cells that are not located in the
central field may be lost to optical aberration. Hence, only the cells
located in the central field, accounting for 50% of the area from the image center, are
used for cell classification. A total of 3830 cells extracted from 196 images were
classified into 6 different patterns, i.e. diffused (599), peripheral (529), nucleolar (94),
coarse-speckled (1956), fine-speckled (56), and discrete-speckled (596), by an
experienced physician, Dr. Hsieh. The classified cell patterns are adopted as the ground
truth to verify the classification performance of the proposed method.

Features for cell classification 

To find suitable features to represent the patterns of ANA images, both
conventional and state-of-the-art features are investigated. The conventional
features used to describe patterns include statistics of intensity and texture of blobs. The
statistics of blob intensity include mean, variance, skewness, and entropy. Tamura
features, including coarseness, contrast, and directionality, as well as the Haralick features,
including contrast, correlation, energy, and homogeneity, obtained from the gray-level
co-occurrence matrix (GLCM) at 0, 45, 90, and 135 degrees, are also used to characterize
the blobs. Furthermore, frequently used state-of-the-art features, such as the fuzzy
texture spectrum (FTS) [27,28] and the local binary pattern (LBP) [29-31], are also adopted
for cell classification in this study.

In addition, by observing the ANA images, a novel feature has been proposed to 
describe the appearance of blobs from the intensity images. As illustrated in Figure 18, 
the features obtained from the perimeters and the central areas of the blobs are differ- 
ent between two different cell patterns, such as the peripheral and nucleolar patterns. 
The two patterns can be distinguished by calculating the intensity difference between the
perimeter and the central area of a blob according to the following equation:

where Pavg denotes the average intensity of pixels located at the perimeter of a blob,
and Cavg indicates the average intensity of the central area with a size of 7 × 7 pixels.
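
A minimal sketch of this perimeter-versus-center measurement follows. It is an illustration
under stated assumptions, not the authors' implementation: the perimeter is taken as the
blob pixels with at least one 4-connected neighbor outside the blob, the central area is a
7 × 7 window around the rounded blob centroid (clipped to the blob), and the feature is
computed as Pavg − Cavg, where the sign convention is assumed. The function name
`dpc_feature` is hypothetical.

```python
def dpc_feature(image, mask, window=7):
    """Return (p_avg, c_avg, p_avg - c_avg) for one blob.

    image: 2-D list of intensities; mask: 2-D list, truthy inside the blob.
    """
    rows, cols = len(image), len(image[0])
    blob = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not blob:
        raise ValueError("empty blob mask")

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    # Perimeter: blob pixels with at least one 4-neighbor outside the blob.
    neighbors = ((1, 0), (-1, 0), (0, 1), (0, -1))
    perimeter = [
        (r, c) for r, c in blob
        if not all(inside(r + dr, c + dc) for dr, dc in neighbors)
    ]
    p_avg = sum(image[r][c] for r, c in perimeter) / len(perimeter)

    # Central area: window x window square around the rounded centroid.
    cr = round(sum(r for r, _ in blob) / len(blob))
    cc = round(sum(c for _, c in blob) / len(blob))
    half = window // 2
    center = [
        (r, c)
        for r in range(cr - half, cr + half + 1)
        for c in range(cc - half, cc + half + 1)
        if inside(r, c)
    ]
    if not center:
        raise ValueError("centroid window does not intersect the blob")
    c_avg = sum(image[r][c] for r, c in center) / len(center)
    return p_avg, c_avg, p_avg - c_avg


# Toy "peripheral-like" blob: bright rim (200) around a dim interior (50).
size = 11
image = [[200 if r in (0, size - 1) or c in (0, size - 1) else 50
          for c in range(size)] for r in range(size)]
mask = [[1] * size for _ in range(size)]
p_avg, c_avg, dpc = dpc_feature(image, mask)
print(p_avg, c_avg, dpc)
```

Under the assumed sign, a bright rim with a dim interior yields a positive value, while a
blob with a bright center and dim rim yields a negative one.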

By observing the images in Figure 1, it can be found that different cell patterns contain a
variety of regions with different sizes and patterns. For example, although nucleolar
and discrete-speckled patterns both contain light regions, the number of light regions
in cells with the discrete-speckled pattern is greater than that in cells with the nucleolar
pattern. In contrast, some dark regions can be observed in the coarse-speckled and
fine-speckled patterns. These characteristics are useful for reducing false cases when
discriminating cells with different patterns. A total of 6 features derived from statistics
of light and dark regions inside the blobs, including the numbers of dark and light regions
as well as the mean and variance of intensity of dark and light regions, are obtained for
cell discrimination.
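
The six light/dark region features can be sketched as follows. The text does not say how
light and dark regions are delineated, so this example assumes two fixed intensity
thresholds (`light_thr` and `dark_thr`, both hypothetical) and treats each 4-connected
group of qualifying pixels inside the blob as one region. It is only an illustration of the
kind of statistics described, not the authors' procedure.

```python
from collections import deque


def region_stats(image, mask, keep):
    """Return (region count, mean, variance) over blob pixels where keep(value) holds.

    Regions are 4-connected components found with a breadth-first flood fill.
    """
    rows, cols = len(image), len(image[0])

    def selected(r, c):
        return (0 <= r < rows and 0 <= c < cols
                and bool(mask[r][c]) and keep(image[r][c]))

    seen = set()
    values = []
    count = 0
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not selected(r, c):
                continue
            count += 1  # a new, previously unvisited region starts here
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                values.append(image[y][x])
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (ny, nx) not in seen and selected(ny, nx):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    if not values:
        return 0, 0.0, 0.0
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return count, mean, variance


def light_dark_features(image, mask, light_thr, dark_thr):
    """Return 6 values: (count, mean, variance) of light regions, then of dark regions."""
    light = region_stats(image, mask, lambda v: v >= light_thr)
    dark = region_stats(image, mask, lambda v: v <= dark_thr)
    return light + dark


# Toy blob: two isolated bright spots and one two-pixel dark region.
image = [
    [250, 100, 100, 100, 100],
    [100, 100, 100, 100, 100],
    [100,  10,  10, 100, 100],
    [100, 100, 100, 100, 100],
    [100, 100, 100, 100, 250],
]
mask = [[1] * 5 for _ in range(5)]
features = light_dark_features(image, mask, light_thr=200, dark_thr=50)
print(features)
```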

In total, 129 candidate features were used to represent the patterns of individual
ANA images. As indicated in Table 3, the features were grouped into 11 categories, i.e.
STATS (3 features), TAMURA (3 features), HARALICK (16 features), FTS (45 features),
LR (3 features), DR (3 features), LBP8 (10 features), LBP16 (18 features), LBP24
(26 features), ENTROPY (1 feature) and DPC (1 feature).
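
As a quick arithmetic check, the category sizes listed above do add up to 129. The sketch
below also derives an index range for each category inside a single feature vector; the
ordering of the categories within the vector is an assumption made for illustration, and
`feature_slices` is a hypothetical helper name.

```python
# Category sizes as listed in the text; the ordering inside the vector is assumed.
FEATURE_GROUPS = [
    ("STATS", 3), ("TAMURA", 3), ("HARALICK", 16), ("FTS", 45),
    ("LR", 3), ("DR", 3), ("LBP8", 10), ("LBP16", 18), ("LBP24", 26),
    ("ENTROPY", 1), ("DPC", 1),
]


def feature_slices(groups):
    """Map each category name to its (start, stop) index range; also return the total."""
    slices, start = {}, 0
    for name, size in groups:
        slices[name] = (start, start + size)
        start += size
    return slices, start


SLICES, TOTAL = feature_slices(FEATURE_GROUPS)
print(TOTAL)          # 129
print(SLICES["FTS"])  # (22, 67)
```

The HARALICK count is consistent with 4 measures evaluated at 4 GLCM angles (4 × 4 = 16).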

Design and validation of cell classifier 

Support vector machine (SVM) is a supervised learning method widely used for classi- 
fication of data patterns [32,33]. A special property of SVM is that it can simulta- 
neously minimize the empirical classification error and maximize the geometric margin 
of a classifier. It is a powerful methodology for solving problems in nonlinear classifica- 
tion, function estimation, and density estimation, leading to many applications [34]. 

In this study, the SVM classifier was implemented with the LIBSVM tool [35], which
supports multi-class classification. The radial basis function (RBF) was selected as the
kernel because of its advantages in mapping samples into a higher dimensional space so that


Table 3 Categories of features used for cell classification.

Category   No. of features   Description
STATS      3                 Intensity statistics of blobs: mean, variation, and skewness
TAMURA     3                 Coarseness, contrast, and directionality
HARALICK   16                Contrast, correlation, energy, and homogeneity in co-occurrence matrix with degrees 0, 45, 90, and 135
FTS        45                Fuzzy texture spectrum based on the relative intensity levels among pixels
LR         3                 Statistics of light regions: No. of regions as well as mean and variation of intensity
DR         3                 Statistics of dark regions: No. of regions as well as mean and variation of intensity
LBP8       10                Calculated based on 8 neighbors at the distance of one
LBP16      18                Calculated based on 16 neighbors at the distance of one
LBP24      26                Calculated based on 24 neighbors at the distance of one
ENTROPY    1                 Intensity statistic of blobs
DPC        1                 Intensity difference between perimeter and central area
Total      129
it can handle the case when the relation between class labels and attributes is
nonlinear [36]. The optimal combination of the penalty parameter C and the RBF kernel
parameter γ was determined by dividing the range 2^-10 to 2^10 into 21 steps for each
parameter, resulting in a total of 441 combinations.
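
The size of this search grid can be reproduced with a few lines. The sketch assumes that
both parameters take the values 2^-10, 2^-9, ..., 2^10 (integer exponents, 21 values each),
which is one reading of the range that yields 21 × 21 = 441 pairs. The helper names and the
scoring function are hypothetical; in practice the score would be the cross-validation
accuracy of an SVM trained with the given pair.

```python
from itertools import product


def rbf_parameter_grid(low_exp=-10, high_exp=10):
    """Return all (C, gamma) pairs with both values in 2**low_exp ... 2**high_exp."""
    values = [2.0 ** e for e in range(low_exp, high_exp + 1)]
    return list(product(values, repeat=2))


def grid_search(grid, score):
    """Return the (C, gamma) pair that maximizes score(C, gamma)."""
    return max(grid, key=lambda pair: score(*pair))


grid = rbf_parameter_grid()
print(len(grid))  # 441


# Stand-in scoring function for illustration only; it peaks at C = 8, gamma = 0.125.
def toy_score(C, gamma):
    return -abs(C - 8.0) - abs(gamma - 0.125)


best = grid_search(grid, toy_score)
print(best)  # (8.0, 0.125)
```

An exhaustive search of this kind is simple to parallelize, since each (C, γ) pair is
evaluated independently.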

Two experiments were conducted to verify the classification performance of the SVM
classifier: cross validation (CV) and independent training and testing (ITT).
For the CV experiment, 5-fold cross-validation was conducted to obtain the optimal
parameters C and γ in the training phase. For the ITT experiment, the image dataset was
randomly divided into a training set and a testing set, each containing 50% of the
randomly selected images. Again in the training phase, 5-fold cross-validation
was used to obtain the optimal combination of parameters C and γ based on the
training set. The ITT experiment was repeated 10 times.
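
The ITT protocol can be sketched as a repeated random half/half split at the image level.
The function name `itt_splits`, the fixed seed, and the use of integer image identifiers
are assumptions for illustration; the text specifies only the 50%/50% split and the 10
repetitions.

```python
import random


def itt_splits(image_ids, repeats=10, seed=0):
    """Yield (train_ids, test_ids) pairs, each a random half/half split by image."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    ids = list(image_ids)
    half = len(ids) // 2
    for _ in range(repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        yield shuffled[:half], shuffled[half:]


splits = list(itt_splits(range(196)))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 98 98
```

Splitting by image rather than by cell keeps all cells of one image on the same side of
the split, so test cells never come from an image seen during training.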

Table 4 reports the accuracy obtained from the CV experiment and the 10 ITT runs with all
of the features presented in Table 3. It implies that the proposed segmentation method is
good enough to detect cell contours for extracting features to design a classifier with
satisfactory classification accuracy. Additionally, one of the objectives of this study is to
select salient features to represent cell patterns.

Sequential backward selection (SBS) [37] has frequently been used for feature
selection in many studies. In this study, SVM-RFE (recursive feature elimination),
reported to be effective for multi-cluster classification [38], was adopted to
eliminate unimportant features according to the minimum redundancy and maximum
relevancy (MRMR) criterion [39]. It was implemented with MIToolbox (Matlab
version) [26]. As shown in Figure 19, the best average accuracy obtained is 99.76%
with 60 features selected for designing the classifier in the CV experiment, while
the classifier designed using 124 selected features achieves 96.90% accuracy in the
ITT experiment.

Discussion 

Cytology evaluation has been shown to be a safe, efficient and well-established techni- 
que for the diagnoses of many diseases. Its ability to reduce the mortality and morbid- 
ity of cervical cancer through mass screening is the most famous success. Classical 
cytological diagnosis is based on microscopic observation of specialized cells and 

Table 4 Classification accuracies (%) of different cell patterns

Exp.   Iter.   All     Diffused   Peripheral   Nucleolar   Coarse-speckled   Fine-speckled   Discr.-speckled
CV             99.69   98.33      100          98.94       99.95             100             100
ITT    1       97.65   93.98      99.24        82.98       98.88             85.71           99.33
       2       96.45   88.96      98.48        95.74       98.57             71.43           97.65
       3       97.23   93.65      97.72        87.23       98.47             78.57           99.66
       4       96.45   92.31      98.10        82.98       98.36             78.57           96.64
       5       97.13   94.98      96.96        89.36       98.47             71.43           98.66
       6       96.81   93.98      99.24        76.60       98.26             85.71           96.98
       7       96.81   94.31      97.34        87.23       98.67             82.14           95.64
       8       97.33   93.65      98.48        91.49       98.98             78.57           97.32
       9       96.50   90.30      98.48        93.62       98.47             85.71           95.97
       10      96.50   90.64      98.48        85.10       98.36             82.14           97.65
       Avg     96.87   92.68      98.25        87.23       98.55             80.00           97.55

Classification accuracies of the 6 cell patterns and of all cells achieved in the CV and ITT experiments.
Figure 19 Comparison of accuracy against different number of features. Features selected using the
SVM-RFE method compared between cross validation (CV) and independent training and testing (ITT)
experiments.



qualitative assessment with descriptive criteria, which may lead to inconsistent results
because of subjective variability among observers [40]. Recently, automatic
or semi-automatic computerized systems developed for segmenting and analyzing
stained cervical cells from Pap smear images have been demonstrated to be effective and
efficient in assisting pathologists in the diagnosis of abnormal cells [34,41-43] and in the
discrimination of different types of cells [34,44,45] through accurate and objective
measurements of cell texture and morphology.

Tracing the cell migration, cell cycle, and cell differentiation from fluorescent micro- 
scopic images through automatic segmentation, classification, and tracking of living 
and cultured cells has also been widely conducted [46-48]. However, an automated 
image analysis system developed to fit a specific type, assay, or image set is hardly 
applicable to different cells acquired from different modalities [49]. Hence, techniques 
used for segmenting cells from visible-light microscopic images may not be directly 
applied in extracting cells from fluorescent microscopic images, whereas techniques 
used for extracting cells in a living cell population from fluorescent microscopic images 
may not be effective for processing IIP images. 

Tested with the 3830 cells extracted from 196 images, the segmentation results show
that PVO is greater than 89% and PVD is less than 22%. The average classification
accuracy achieved in this study is as high as 99.69% (error rate: 0.31%) and 96.90% (error
rate: 3.10%) for the CV and ITT experiments, respectively, which outperforms the performance
reported in previous studies [3,5,6,16-19]. Table 5 compares the cell/image numbers and
the error rates in classification of this study with those of previous investigations.

Note that the cells included in the database used in this study are quite different
from the cells adopted in previous studies, which may induce bias when making
comparisons. CellProfiler is freely available software [49] useful for automatic cell
segmentation as well as for quick and easy classification and scoring of cells with
diverse cellular morphologies [48]. Figure 20 compares the segmented cell examples
between the proposed method and CellProfiler. It can be observed that the
Table 5 Comparison of error rate between this study and previous investigations.

Literature               Error rate   No. of cells/images   Validation method
Perner et al. [6]        23.60%       NA/321                Human expert
                         25.00%       NA/321                leave-one-out CV
Sack et al. [3]          16.91%       1041/676              CV
Elbischger et al. [17]   9.75%        982/38                ITT (1:1)
Soda & Iannello [5]      5.80%        573/37                8-fold CV
Rigon et al. [15]        26.00%       573/37                8-fold CV
Huang et al. [18]        7.60%        1020/NA               10-fold CV
Voigt et al. [19]        6.00%        NA/351                ITT
This study               0.31%        3830/196              5-fold CV
                         3.10%        3830/196              ITT (1:1)

CV indicates cross validation and ITT denotes independent training and testing.




Figure 20 Comparison of segmentation outcome between proposed method and CellProfiler. (a) Original
images of 6 different ANA patterns and their segmented results using (b) proposed method and (c) CellProfiler.



Table 6 Comparisons of number of segmented cells and classification performance between proposed method and CellProfiler evaluated based on PVO, PVD,
RAE, and MHD.

                    Detected cell number                 PVO (%)                   PVD (%)                   RAE ± Std.                      MHD ± Std.
Pattern             G. Truth   Proposed   CellProfiler   Proposed   CellProfiler   Proposed   CellProfiler   Proposed        CellProfiler    Proposed          CellProfiler
Diffuse             251        157        107            87.89      84.75          24.23      30.50          0.6386 ± 0.42   0.5248 ± 0.36   250.90 ± 228.42   510.69 ± 250.56
Peripheral          191        153        97             93.47      72.43          13.06      55.15          0.3039 ± 0.35   0.5120 ± 0.31   87.15 ± 182.86    602.36 ± 267.78
Nucleolar           285        175        101            91.94      84.90          16.12      30.20          0.4777 ± 0.41   0.6445 ± 0.33   157.99 ± 208.70   560.73 ± 225.77
Coarse speckled     363        260        150            92.35      87.84          15.31      24.32          0.3506 ± 0.36   0.7038 ± 0.21   101.90 ± 199.74   556.02 ± 261.51
Fine speckled       117        67         36             91.27      91.22          17.46      17.55          0.5423 ± 0.41   0.8637 ± 0.13   173.20 ± 13.16    458.46 ± 21.41
Discrete speckled   216        195        125            78.03      69.64          43.95      60.71          0.5457 ± 0.37   0.7277 ± 0.18   143.99 ± 202.59   504.55 ± 242.68
proposed method outperforms CellProfiler on individual cells of the 6 different
patterns.

In addition to PVO and PVD, other evaluation criteria, including relative foreground
area error (RAE) [50] and modified Hausdorff distance (MHD) [51], are also used to
measure the segmentation errors. As can be seen in Table 6, the proposed method
demonstrates better segmentation performance than CellProfiler when evaluated
based on PVO, PVD, RAE, and MHD. In addition, the number of missegmented cells
of the proposed method is lower than that of CellProfiler.
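
For readers unfamiliar with the two additional criteria, the sketch below shows commonly
used forms of both. These are assumptions: the text does not spell out the formulas, and
references [50] and [51] may define or scale them differently. Here RAE is the area
mismatch normalized by the larger of the two foreground areas, and MHD is the larger of the
two directed mean nearest-point distances between contour point sets. All function names
are hypothetical.

```python
import math


def relative_area_error(area_ref, area_seg):
    """Relative foreground area error: area mismatch normalized by the larger area."""
    larger = max(area_ref, area_seg)
    if larger == 0:
        return 0.0
    return abs(area_ref - area_seg) / larger


def directed_mean_distance(points_a, points_b):
    """Mean, over the points in A, of the distance to the nearest point in B."""
    return sum(min(math.dist(a, b) for b in points_b) for a in points_a) / len(points_a)


def modified_hausdorff(points_a, points_b):
    """Modified Hausdorff distance: the larger of the two directed mean distances."""
    return max(directed_mean_distance(points_a, points_b),
               directed_mean_distance(points_b, points_a))


# Toy example with two small point sets on a line.
contour_a = [(0, 0), (1, 0)]
contour_b = [(0, 0), (3, 0)]
mhd = modified_hausdorff(contour_a, contour_b)
rae = relative_area_error(100, 80)
print(mhd, rae)
```

Both measures are 0 for identical segmentations; in this form RAE is bounded by 1, whereas
MHD is expressed in pixel units and grows with the distance between the contours.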

Conclusion and future work 

In this study, a segmentation method was proposed to detect the boundaries of HEp-2 
cells automatically, and then classification of cell patterns was performed based on the 
selected features. The results show that the proposed method can detect cells correctly 
in most image cases with PVO greater than 89% and PVD less than 22%, whereas the 
best combination of selected features can achieve an average accuracy as high as 
96.90% in discriminating 6 different types of cell patterns. 

More cell images will be included in the dataset for verifying the segmentation
and classification performance in the future. Furthermore, an automatic
segmentation and classification system with a graphical user interface (GUI) will be
developed for computer-aided diagnosis. In fact, several different ANA patterns can
appear in a single image, but the segmentation method proposed here only considers
images with a single cell pattern. Future work will focus on developing a
segmentation method to extract cells with different patterns appearing in the same image.